SYSTEM AND METHOD OF SERVICING READ REQUESTS FROM A COMMON MIRROR

BACKGROUND OF THE INVENTION

1. Technical Field: 

The present invention is directed to storage systems. 
More specifically, the present invention is directed to a 
system and method of servicing read requests from a common
mirror.

2. Description of Related Art: 

Most computer systems are made up of at least one
processor and one physical storage system. The processor
processes, stores and retrieves data from the physical
storage system under the guidance of an application program.

Application programs generally run atop an operating
system. Among the many tasks of an operating system is that
of allowing an application program to have a rather
simplistic view of how data (i.e., data files) are stored
within a physical storage system. Typically, an application
program views the physical storage system as containing a
number of hierarchical partitions (i.e., directories) within
which entire data files are stored. This simplistic view is
often referred to as a logical view since most files are not
really stored as unit bodies into directories but rather are
broken up into data blocks that may be strewn across the
entire physical storage system.

The operating system is able to allow an application
program to have this simplistic logical view with the help
of a file management system. The file management system
stores directory structures, breaks up data files into their
constituent data blocks, stores the data blocks throughout a
physical storage system and maintains data logs of where
every piece of data is stored. Thus, the file management
system has to be consulted whenever data files are being
stored or retrieved from storage.

Computer systems that have a plurality of physical 
storage systems (e.g., servers) use an added layer of 
abstraction when storing and retrieving data. The added 
layer of abstraction is a logical volume manager (LVM).
Volume, in this case, is the storage capacity of a physical 
storage system. Thus, volume and physical storage system 
will henceforth be used interchangeably. 

The LVM arranges the physical storage systems into 
volume groups in order to give the impression that storage 
systems having each a much more voluminous storage capacity 
are being used. Within each volume group, one or more 
logical volumes may be defined. Data stored in a logical 
volume appears to be stored contiguously. In actuality,
however, the data may be interspersed into many different
locations across all the physical storage systems that make
up the volume group.

Stated differently, each logical volume in a logical 
volume group is divided into logical partitions. Likewise, 
each physical volume in a volume group is divided into 
physical partitions. Each logical partition corresponds to 
at least one physical partition. But, although the logical 
partitions in a logical volume are numbered consecutively or 
appear to be contiguous to each other, the physical 
partitions to which they each correspond, need not be 
contiguous to each other. And indeed, most often, the 
physical partitions are not contiguous to each other. Thus,
one of the many tasks of the LVM is to keep tabs on the
location of each physical partition that corresponds to a 
logical partition. 

For fault tolerance and performance, some servers store
at least one extra copy of each piece of data onto the
physical storage systems they use. For example, if three
physical storage systems are used, a server may store a copy
of each piece of data in each physical storage system.
Storing more than one copy of a piece of data is called
mirroring the data. In order to store mirrored data, each
logical partition used must correspond to as many physical
partitions as there are mirrors (or copies) of the data. In
other words, if the data is mirrored three times, for
example, each logical partition has to correspond to three
physical partitions.
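
The correspondence between logical and physical partitions described above can be sketched as a small lookup table. This is only an illustration under stated assumptions: the class names, the disk names and the partition numbers below are hypothetical and are not taken from any actual LVM implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPartition:
    volume: str  # physical volume (disk) holding this copy
    index: int   # partition number on that volume


class MirroredLogicalVolume:
    """Maps each logical partition to one physical partition per mirror."""

    def __init__(self, copies):
        self.copies = copies  # number of mirrors (copies) of the data
        self._map = {}        # logical partition number -> tuple of copies

    def assign(self, logical_index, partitions):
        partitions = tuple(partitions)
        if len(partitions) != self.copies:
            raise ValueError(
                f"expected {self.copies} physical partitions, got {len(partitions)}"
            )
        self._map[logical_index] = partitions

    def locate(self, logical_index):
        """Return every physical partition backing a logical partition."""
        return self._map[logical_index]


# Logical partitions 0 and 1 are consecutive; their physical copies are not.
lv = MirroredLogicalVolume(copies=3)
lv.assign(0, [PhysicalPartition("disk1", 7),
              PhysicalPartition("disk2", 42),
              PhysicalPartition("disk3", 3)])
lv.assign(1, [PhysicalPartition("disk1", 19),
              PhysicalPartition("disk2", 5),
              PhysicalPartition("disk3", 88)])
```

With data mirrored three times, `lv.locate(0)` returns three physical partitions, any one of which may service a read of logical partition 0.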

The three physical storage systems in the example above
may be referred to as mirrors of each other. Obviously,
data may be read from any one of the three mirrors. Several
methods of reading data from mirrors have been used. In one
method, the mirrors are ranked as first, second and third
and data is always read from the first mirror. In another
method, data is read from the mirror whose magnetic reading
head is closest to the data. In yet another method, data is
read from the mirrors in a round robin fashion.

In some instances, however, these methods may not be
ideal for reading data from mirrors. For example, in the
first method, the mirror from which data is always being
read may become a bottleneck while the other mirrors stay
idle. Performance of a computer system that uses this
method may at times be severely degraded.

In the second method, one mirror may continually
service read requests if data to be read is closest to the
magnetic head of the mirror. Again, this method may
adversely affect performance as that particular mirror may
become a bottleneck.

In the third method, data to be read may be closer to
the magnetic head of a mirror that just serviced a read
request. Nonetheless, a different mirror will be used to
service the request. This, of course, may adversely affect
performance.

To mitigate the adverse performance of the methods 
enumerated above, a fourth method has been used. The fourth 
method uses an algorithm that chooses the least busy of a 
set of mirrors to service a read request. But if all the 
mirrors are equally busy, the first mirror to have become 
busy is used to service the request. In such a case, if a 
plurality of read requests is received and if each piece of 
data requested is located close to the next piece of data, 
different mirrors may be used to service the requests. 
Clearly, it would be advantageous to have one mirror service
these requests.
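
The selection methods discussed above, and the way the fourth method can scatter neighbouring reads, can be sketched as follows. This is a simplified model under stated assumptions: the `Mirror` class, the `outstanding` and `busy_since` fields and the tick values are hypothetical, and the second method (closest magnetic head) is not modeled because it depends on drive geometry.

```python
from itertools import count


class Mirror:
    def __init__(self, name, outstanding=0, busy_since=None):
        self.name = name
        self.outstanding = outstanding  # requests queued on this mirror
        self.busy_since = busy_since    # tick at which it became busy


def pick_first(mirrors):
    """First method: always read from the first-ranked mirror."""
    return mirrors[0]


def make_round_robin(mirrors):
    """Third method: rotate through the mirrors on successive reads."""
    ticket = count()
    return lambda: mirrors[next(ticket) % len(mirrors)]


def pick_least_busy(mirrors):
    """Fourth method: least busy mirror; ties go to the mirror that
    became busy first."""
    least = min(m.outstanding for m in mirrors)
    tied = [m for m in mirrors if m.outstanding == least]
    return min(tied, key=lambda m: m.busy_since or 0)


def dispatch(mirrors, tick):
    """Send one read request using the fourth method."""
    mirror = pick_least_busy(mirrors)
    if mirror.outstanding == 0:
        mirror.busy_since = tick
    mirror.outstanding += 1
    return mirror.name


# Three equally busy mirrors receive three requests for neighbouring data.
mirrors = [Mirror("disk1", 1, busy_since=0),
           Mirror("disk2", 1, busy_since=1),
           Mirror("disk3", 1, busy_since=2)]
used = [dispatch(mirrors, tick) for tick in (3, 4, 5)]
```

In this model each of the three requests lands on a different mirror, because servicing one request makes that mirror busier than the others. That is the behavior the chaining approach is meant to avoid.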

Hence, what is needed is a system, apparatus and method 
of chaining a plurality of read requests such that they are 
issued to one mirror when the locations in which the 
requested data is stored are close to each other. The read 
requests may be issued to the least used mirror in a set of
mirrors.

SUMMARY OF THE INVENTION



The present invention provides a system and method of 
servicing a plurality of read requests using a common 
mirror. When a plurality of requests is received, it is 
determined whether the amount of data requested by the read 
requests is within a user-configurable threshold. The read 
requests are chained together if the amount of data 
requested by the read requests is within the user- 
configurable threshold. After being chained together, the 
read requests may be sent to the common mirror for 
servicing. The common mirror, in this case, is a least used 
mirror in a set of mirrors. To reduce seek and/or 
rotational time of the common mirror, it may be ascertained 
that the data being requested by the read requests is within 
a user-configurable range before chaining the read requests 
together. In some cases, it may be ascertained that the 
plurality of read requests is to be grouped together before 
the read requests are chained together. 

In a particular embodiment, the system and method of 
the present invention may service a solitary read request 
using a common mirror. Particularly, if the solitary read 
request follows a previous read request and if it is 
determined that the amount of data requested by the first 
and the second read requests is within a user-configurable 
threshold, then the second read request may be sent to the 
same mirror that services the previous read request. Note 
that in this case, the second read request will be sent to 
the mirror that services the previous read request so long 
as the second read request is received while the previous 
read request is being serviced or is received within a
user-configurable time frame from the previous read request.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the 
invention are set forth in the appended claims. The 
invention itself, however, as well as a preferred mode of 
use, further objectives and advantages thereof, will best be 
understood by reference to the following detailed 
description of an illustrative embodiment when read in
conjunction with the accompanying drawings, wherein: 

Fig. 1 is a conceptual view of a data storage 
subsystem. 

Fig. 2 depicts a conceptual view of a map that may be 
used by the logical volume manager (LVM) of the present
invention. 

Fig. 3 is a conceptual view of data stored in a set of 
mirrors.

Fig. 4 is a flowchart of a process that may be used to 
implement the present invention. 

Fig. 5 is a flowchart of a process that may be used by 
the LVM when a plurality of read requests is received. 

Fig. 6 is a flowchart of a process that may be used by 
the LVM when there is a notification that a plurality of 
requests ought to be grouped together. 

Fig. 7 is a flowchart of a process that may be used by 
the LVM when the read request received is solitary. 

Fig. 8 is an exemplary block diagram of a computer 
system in which the invention may be implemented.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

To better understand the invention, a more detailed
explanation of the LVM is needed. The LVM interacts with
application programs and the physical storage devices as
shown in Fig. 1. In Fig. 1 three layers are depicted in a
data storage subsystem: an application layer 100, a logical
layer 110 and a physical layer 120, each having one or more
devices. It should be noted that the devices shown in the
three layers are not all inclusive. There may be more
devices in use in each of the application layer 100, the
logical layer 110 and the physical layer 120. Thus, the
devices in Fig. 1 should be taken only as an example of
devices that may be used in a data storage subsystem.

The logical layer 110, for all intents and purposes, is
the LVM. The LVM may be regarded as being made up of a set
of operating system commands, library subroutines or other
tools that allow a user to establish and control the logical
volume storage. The LVM controls the physical storage
system resources by mapping data between a simple and
flexible logical view of storage space and the actual
physical storage system. The LVM does this by using a layer
of device driver code that runs above traditional device
drivers. This logical view of the disk storage is provided
to application programs and is independent of the underlying
physical disk structure.

The logical layer 110 contains a logical volume 112
that interacts with logical volume device driver 114. A
device driver, as is well known in the art, acts as a
translator between a device and programs that use the
device. That is, the device driver accepts generic commands
from programs and translates them into specialized commands
for the device. In this case, the logical volume device
driver 114 translates commands from an application program
that may be executing on the computer system for device
driver 130. Thus, when an application program sends
commands to file system manager 102 to store or retrieve
data from logical volume 112, the file system manager 102
informs the logical volume manager 112 of the application
program's wish. The logical volume manager 112 then may
convey the wish to the logical volume device driver 114.
The logical volume device driver 114 may consult the
appropriate map and instruct the device driver 130 which
ones of physical storage systems 122, 124 and 126 to use for
the data.

When data is mirrored, a map is used to correlate the
logical volume used to the actual physical storage systems
in which the data is stored. Generally, the map includes
the partitions or sectors of the physical storage systems
that are used and is stored in the LVM. Fig. 2 depicts a
conceptual view of an exemplary map. Data A is stored in
location1, location2 and in location3 of disk1, disk2 and
disk3, respectively. Likewise, data B and data C are each
stored in a corresponding location of disk1, disk2 and
disk3, respectively. Fig. 3 depicts a logical view of data
A, B and C.

As mentioned before, in the past when all three mirrors
(i.e., disk1, disk2 and disk3) were busy and requests for
data A, B and C were received, the LVM would send one
request to the first mirror to have become busy. The other
two requests might have been sent to one of the other two
mirrors or each one of the other two mirrors might have
serviced one of the other two requests. The present
invention, however, may chain together the three requests
and have the mirror which has serviced the least number of
read requests within a user-configurable time frame service
the chained requests. To do so, however, the requests may
have to be for data that is within a user-configurable range
from each other. For example, if the range is configured to
be two megabytes and if data A, B and C are within the two
megabyte range, the requests may be chained together and the
least used mirror may be used to service the chained
requests. In so doing, seek and/or rotational time
associated with reading the data may be reduced. Seek time
is the time it takes for the magnetic head or heads of a
disk drive to move over a sector or sectors of the disk
within which the data is contained. Rotational time, on the
other hand, is the time it takes for a desired sector to
move from where it currently is to where it needs to be for
the data to be read.

In order to ensure that the mirror servicing the
chained requests does not become too overburdened, a
user-configurable input/output (I/O) threshold may be used.
That is, if the amount of data to be read is within 10
megabytes (the user-configurable I/O threshold), for example,
the least used mirror may be used to service the chained
requests. Otherwise, the requests may not be chained
together and more than one mirror may be used to service the
requests.
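
The chaining decision described above can be sketched as two checks followed by a least-used selection. This is a minimal sketch under stated assumptions: the names `ReadRequest`, `Mirror`, `recent_reads` and `chain_to_least_used` are hypothetical, "within range" is assumed to mean that all requested bytes fit in one window of the configured size, and the two-megabyte and ten-megabyte defaults are the example values from the text.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class ReadRequest:
    offset: int  # byte offset of the requested data
    length: int  # number of bytes requested


@dataclass
class Mirror:
    name: str
    recent_reads: int  # reads serviced within the configured time frame


def within_range(requests, max_range):
    """Assumed definition: all requested bytes fit in one max_range window."""
    start = min(r.offset for r in requests)
    end = max(r.offset + r.length for r in requests)
    return end - start <= max_range


def within_io_threshold(requests, io_threshold):
    """True when the total amount of data to read stays within the threshold."""
    return sum(r.length for r in requests) <= io_threshold


def chain_to_least_used(requests, mirrors,
                        max_range=2 * MB, io_threshold=10 * MB):
    """Return (mirror, chained requests), or None to fall back to the
    customary per-request mirror selection."""
    if not within_range(requests, max_range):
        return None
    if not within_io_threshold(requests, io_threshold):
        return None
    mirror = min(mirrors, key=lambda m: m.recent_reads)
    return mirror, list(requests)


mirrors = [Mirror("disk1", 12), Mirror("disk2", 4), Mirror("disk3", 9)]
a = ReadRequest(offset=0, length=256 * 1024)
b = ReadRequest(offset=512 * 1024, length=256 * 1024)
c = ReadRequest(offset=MB, length=512 * 1024)
result = chain_to_least_used([a, b, c], mirrors)
```

Here the three requests span 1.5 megabytes and total one megabyte, so they are chained and handed to `disk2`, the mirror with the fewest recent reads. A request far away on the volume, or a lower I/O threshold, makes the function return `None` so that the caller can proceed as customary.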

Some application programs do notify the LVM when
requests should be logically grouped together. In those
instances, the requests may be chained together so long as
they are within the user-configurable I/O threshold.

To reduce seek and rotational times, modern disk drives
provide a read-ahead feature. The read-ahead feature
enables data that is highly likely to be requested in the
near future to be pre-fetched. Specifically, since data is
generally read in sequence, when a piece of data is read
from such disks, an algorithm within the disk controller
microcode may instruct the disk to read data that is
adjacent to the data being read since it is very likely that
it may be requested in the near future. This data is then
cached in a buffer on the disk drive. If the data cached is
later requested, it may be provided from the buffer instead
of being read from the disk. Thus, the latency that would
be due to the seek and/or rotational time may be reduced or
altogether obviated.

Note that there may be more advanced read-ahead 
algorithms used instead of the one described above. Thus, 
the described algorithm is used for illustrative purposes 
only. 

The present invention takes advantage of this read-ahead
feature by sending a second request to a mirror that has
finished servicing a read request so long as the time
between the first request and the second request is within a
user-configurable time frame and the data being read is
within a user-configurable I/O threshold. The
user-configurable time frame may be set to the average time
it takes for the buffer to be filled up and for data therein
to be replaced by new data. As before, the user-configurable
I/O threshold may be 10 megabytes.

Further, when a second read request is received while a
first read request is being serviced, if the data to be read
in response to the second read request is located within a
user-configurable range from the data being read, then the
second read request may be sent to the mirror servicing the
first read request if a user-configurable I/O threshold has
not been exceeded. In this case, the user-configurable
range may again be two megabytes and the user-configurable
I/O threshold 10 megabytes.
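
The handling of a second, solitary read request can be sketched as follows. This is only an illustration under stated assumptions: the names `ReadRequest`, `PreviousRead` and `reuse_mirror` are hypothetical, the distance between requests is assumed to be measured between their starting offsets, and the 50 millisecond default time frame is an arbitrary placeholder, since the text only says the time frame is user-configurable.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass(frozen=True)
class ReadRequest:
    offset: int     # byte offset of the requested data
    length: int     # number of bytes requested
    arrival: float  # arrival time in seconds


@dataclass(frozen=True)
class PreviousRead:
    request: ReadRequest
    mirror: str       # mirror that serviced (or is servicing) the request
    in_service: bool  # True while the mirror is still servicing it


def reuse_mirror(second: ReadRequest, prev: Optional[PreviousRead],
                 max_range: int = 2 * MB, io_threshold: int = 10 * MB,
                 time_frame: float = 0.05) -> Optional[str]:
    """Return the previous mirror's name if the second request should be
    sent there, or None to proceed as customary."""
    if prev is None:
        return None
    first = prev.request
    if prev.in_service:
        # Previous read still in progress: the new data must be nearby.
        if abs(second.offset - first.offset) > max_range:
            return None
    elif second.arrival - first.arrival > time_frame:
        # Previous read finished too long ago for read-ahead data to remain.
        return None
    if first.length + second.length > io_threshold:
        return None
    return prev.mirror


first = ReadRequest(offset=0, length=64 * 1024, arrival=0.0)
near = ReadRequest(offset=MB, length=64 * 1024, arrival=0.01)
choice = reuse_mirror(near, PreviousRead(first, "disk2", in_service=True))
```

In this sketch `choice` is `"disk2"`, because the second request arrives while the first is being serviced, lies within two megabytes of it, and the combined amount of data is well under ten megabytes.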

Fig. 4 is a flowchart of a process that may be used by
the present invention. The process starts when a read
request is received (steps 400 and 402). Then a check is
made to determine whether the read request is one of a
plurality of read requests being received or is a solitary
read request. If the request is one of a plurality of read
requests being received, a check may be made to determine
whether there is a notification regarding grouping the
requests together. If so, the process in Fig. 6 may be
followed (steps 404, 414 and 418). If there is no
notification, the process in Fig. 5 may be followed (steps
414 and 416). If the request is a solitary read
request, a check may be made to determine whether there has
been a previous read request received. If so, the process
in Fig. 7 may be followed (steps 404, 406 and 412).
Otherwise, the LVM may proceed as customary (steps 404, 406,
408 and 410).
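
The branching of Fig. 4 can be summarized as a small dispatch function. This is a sketch under stated assumptions: the `Path` enumeration, the `classify` function and its parameter names are hypothetical, and the function only selects which process to follow.

```python
from enum import Enum


class Path(Enum):
    GROUPED = "Fig. 6"       # plurality of requests with a grouping notification
    PLURAL = "Fig. 5"        # plurality of requests without a notification
    SOLITARY = "Fig. 7"      # solitary request that follows a previous request
    CUSTOMARY = "customary"  # solitary request with no previous request


def classify(num_requests, grouping_notified, has_previous_request):
    """Select the process to follow for the read request(s) just received."""
    if num_requests < 1:
        raise ValueError("at least one read request is required")
    if num_requests > 1:
        return Path.GROUPED if grouping_notified else Path.PLURAL
    return Path.SOLITARY if has_previous_request else Path.CUSTOMARY


selected = classify(num_requests=3, grouping_notified=False,
                    has_previous_request=False)
```

Three requests received together without a grouping notification select the Fig. 5 process, so `selected` is `Path.PLURAL`.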
Fig. 5 is a flowchart of a process that may be used
when a plurality of read requests is received. When the
process starts, a check is made to determine whether the
data requested is within the user-configured range. If not,
the LVM may proceed as customary (steps 500, 502, 504 and
512). If the data being requested is within a
user-configured range, another check is made to determine
whether the amount of data to be read is within a
user-configured I/O threshold. If not, the LVM may proceed
as customary (steps 502, 506, 504 and 512). If the amount
of data to be read is within the user-configured I/O
threshold, the requests may be chained together and sent to
the least used mirror of a set of mirrors before the process
ends (steps 506, 508, 510 and 512).

Docket No. AUS920030526US1

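Fig. 6 is a flowchart of a process that may be used
when there is a notification that the requests ought to be
grouped together. When the process starts, a check is
made to determine whether the data to be read is within the
user-configured I/O threshold. If not, the LVM may proceed
as customary (steps 600, 602, 604 and 610). If the data to
be read is within the user-configured I/O threshold, the
requests may be chained together and sent to the least used
mirror of the set of mirrors before the process ends (steps
602, 606, 608 and 610).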

Fig. 7 is a flowchart of a process that may be used
when the read request received is solitary. When the process
starts, a check is made to determine whether the previous
request is still being serviced. If so, another check is
made to determine whether the data requested by the
second read request is within the user-configured range from
the data requested by the previous read request. If not,
the LVM may proceed as customary (steps 700, [...] 706,
[...]). If the data being requested by the second read
request is within the user-configured range from the data
requested by the previous read request, a check is made
to determine whether the amount of data from both read
requests is within the user-configured I/O threshold. If
not, the LVM may proceed as customary (steps [...] and 720).
If the amount of data from both read requests is
within the user-configured I/O threshold, the second request
may be processed by the same mirror servicing the previous
read request (steps 712, 716 and 720).

If the previous read request is not still being serviced,
a check is made to determine whether the previous read
request was received within a user-configured time frame of
the second read request. If so, the process jumps to step
71[?]. Otherwise, the process jumps to step 706 (steps 702,
704 and 706).

Fig. 8 is a block diagram illustrating a data
processing system in which the present invention may be
implemented. Data processing system 800 employs a
peripheral component interconnect (PCI) local bus
architecture. Although the depicted example employs a PCI
bus, other bus architectures such as Accelerated Graphics
Port (AGP) and Industry Standard Architecture (ISA) may be
used. Processor 802 and main memory 804 are connected to
PCI local bus 806 through a PCI bridge. The PCI bridge
also may include an integrated memory controller and cache
memory for processor 802. Additional connections to PCI
local bus 806 may be made through direct component
interconnection or through add-in boards. In the depicted
example, a local area network (LAN) adapter, a SCSI host bus
adapter, and an expansion bus interface are connected
to PCI local bus 806 by direct component connection. In
contrast, audio adapter 816, a graphics adapter, and an
audio/video adapter are connected to PCI local bus 806
by add-in boards inserted into expansion slots. The
expansion bus interface provides a connection for a keyboard
and mouse adapter 820, modem 822 and additional memory.
The small computer system interface (SCSI) host bus adapter
provides a connection for hard disk drive 826, tape drive
828, and CD-ROM drive 830. Typical PCI local bus
implementations will support three or four PCI expansion
slots or add-in connectors.
An operating system runs on processor 802 and is used
to coordinate and provide control of various components
within data processing system 800 in Fig. 8. The operating
system may be a commercially available operating system,
such as Windows XP, which is available from Microsoft
Corporation, or AIX, which is an IBM product. An object
oriented programming system such as Java may run in
conjunction with the operating system and provide calls to
the operating system from Java programs or applications
executing on data processing system 800. "Java" is a
trademark of Sun Microsystems, Inc. Instructions for the
operating system, the object-oriented operating system, and
applications or programs, as well as the invention, are
located on storage devices, such as hard disk drive 826, and
may be loaded into main memory 804 for execution by
processor 802.

Those of ordinary skill in the art will appreciate that
the hardware in Fig. 8 may vary depending on the
implementation. Other internal hardware or peripheral
devices, such as flash ROM (or equivalent nonvolatile
memory) or optical disk drives and the like, may be used in
addition to or in place of the hardware depicted in Fig. 8.
Also, the processes of the present invention may be applied
to a multiprocessor data processing system.

The description of the present invention has been
presented for purposes of illustration and description, and
is not intended to be exhaustive or limited to the invention
in the form disclosed. Many modifications and variations
will be apparent to those of ordinary skill in the art. The
embodiment was chosen and described in order to best explain
the principles of the invention, the practical application,
and to enable others of ordinary skill in the art to
understand the invention for various embodiments with
various modifications as are suited to the particular use
contemplated.



